
ICML 2018



End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: A Reproducibility Study

Ganesh, Anirudh, Reddy, Jayavardhan

arXiv.org Artificial Intelligence

We present a reproducibility study of the state-of-the-art neural architecture for sequence labeling proposed by Ma and Hovy (2016). The original BiLSTM-CNN-CRF model combines character-level representations via Convolutional Neural Networks (CNNs), word-level context modeling through Bi-directional Long Short-Term Memory networks (BiLSTMs), and structured prediction using Conditional Random Fields (CRFs). This end-to-end approach eliminates the need for hand-crafted features while achieving excellent performance on named entity recognition (NER) and part-of-speech (POS) tagging tasks. Our implementation successfully reproduces the key results, achieving a 91.18% F1-score on CoNLL-2003 NER and demonstrating the model's effectiveness across sequence labeling tasks. We provide a detailed analysis of the architecture components and release an open-source PyTorch implementation to facilitate further research.
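The CRF layer's decoding step described above can be sketched in plain NumPy. This is a generic Viterbi decoder for a linear-chain CRF, not the paper's actual implementation; the function name and toy scores are illustrative. It assumes the BiLSTM has already produced per-token tag scores (emissions) and that a learned tag-to-tag transition matrix is available:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions: (seq_len, num_tags) per-token tag scores (e.g. from a BiLSTM)
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    Returns the best tag sequence as a list of tag indices.
    """
    seq_len, num_tags = emissions.shape
    # score[t, j] = best score of any path ending in tag j at position t
    score = np.zeros((seq_len, num_tags))
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    score[0] = emissions[0]
    for t in range(1, seq_len):
        # candidate[i, j] = score of extending a path ending in tag i with tag j
        candidate = score[t - 1][:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score[t] = candidate.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score[-1].argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

Exact decoding over the whole sequence is what distinguishes the CRF from predicting each tag independently: a strong transition score can overrule a locally attractive emission.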



Reviews: Graph-Based Semi-Supervised Learning with Non-ignorable Non-response

Neural Information Processing Systems

I think this claim should be motivated better; at least to me it was not entirely clear why it is important. If the authors can provide some scenarios that help illustrate the claim, it will be beneficial for readers. However, the experimental analysis can be improved. One of the baselines the authors consider is "SM", but they do not mention the paper in which it is proposed. They should report results on multiple real-world datasets. Moreover, the authors do not compare the proposed model with state-of-the-art graph-based SSL methods such as GAT (Velickovic et al., ICLR 2018). [Velickovic et al., ICLR 2018] Graph Attention Networks
Minor points:
-- "vertexes" should be "vertices"
-- Not sure if using gradient descent qualifies as a contribution.


Reviews: GILBO: One Metric to Measure Them All

Neural Information Processing Systems

Overall I think this is a very good paper, and it is one of the better papers I have seen on evaluating GANs. I myself am fairly skeptical of FID and have seen other work criticizing that approach, and this work sheds some light on the situation. I think anyone who follows this work would be better informed about how to evaluate GANs than by the work that introduced the Inception score or FID. That said, there is some missing discussion of and comparison to related work (notably mutual information neural estimation (MINE) by Belghazi et al., 2018), as well as some missing discussion of the inductive bias and boundedness of their estimator. I would like to see these points addressed.


Enabling Reproducibility in Machine Learning MLTrain@RML (ICML 2018) – mltrain

#artificialintelligence

In this tutorial, we will demonstrate how to implement the state-of-the-art End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF paper for named entity recognition using PyTorch. The main aim of the tutorial is to make the audience comfortable with PyTorch and to give a step-by-step walkthrough of the BiLSTM-CNN-CRF architecture for named-entity recognition.
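One step of that walkthrough, the character-level CNN, can be sketched framework-free. This is a minimal NumPy illustration of the idea (convolve filters over character embeddings of a word, then max-pool over time), with hypothetical shapes and names rather than the tutorial's actual PyTorch code:

```python
import numpy as np

def char_cnn_features(char_embeds, filters):
    """Character-level CNN features for one word (BiLSTM-CNN-CRF style).

    char_embeds: (word_len, emb_dim) embedding for each character of the word
    filters: (num_filters, width, emb_dim) convolution kernels
    Returns a (num_filters,) vector: max-over-time pooled filter responses.
    """
    num_filters, width, emb_dim = filters.shape
    # pad so every word, however short, yields at least one window
    pad = np.zeros((width - 1, emb_dim))
    x = np.vstack([pad, char_embeds, pad])
    n_windows = x.shape[0] - width + 1
    responses = np.empty((n_windows, num_filters))
    for t in range(n_windows):
        window = x[t:t + width]  # (width, emb_dim) slice of characters
        # dot each filter with the window over both its axes
        responses[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return responses.max(axis=0)  # max pooling over time
```

The max-pool makes the output length-independent, so words of any length map to a fixed-size vector that can be concatenated with the word embedding before the BiLSTM.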


ICML 2018 Announces Best Paper Awards – SyncedReview – Medium

#artificialintelligence

The International Conference on Machine Learning (ICML) 2018 will be held July 10–15 in Stockholm, Sweden. Yesterday, the prestigious conference announced its Best Paper Awards, chosen from more than 600 accepted papers. Two papers shared top honours: Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, by Anish Athalye of MIT with Nicholas Carlini and David Wagner of UC Berkeley; and Delayed Impact of Fair Machine Learning, from a UC Berkeley research group led by Lydia T. Liu and Sarah Dean. The Best Paper Runner Up Awards go to Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices, from Professor Zengfeng Huang of Fudan University; The Mechanics of n-Player Differentiable Games, from DeepMind and the University of Oxford's David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel; and Fairness Without Demographics in Repeated Loss Minimization, from a Stanford research group including Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang.


Investigating Human Priors for Playing Video Games

Dubey, Rachit, Agrawal, Pulkit, Pathak, Deepak, Griffiths, Thomas L., Efros, Alexei A.

arXiv.org Artificial Intelligence

What makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making. This paper investigates the role of human priors for solving video games. Given a sample game, we conduct a series of ablation studies to quantify the importance of various priors on human performance. We do this by modifying the video game environment to systematically mask different types of visual information that could be used by humans as priors. We find that removal of some prior knowledge causes a drastic degradation in the speed with which human players solve the game, e.g. from 2 minutes to over 20 minutes. Furthermore, our results indicate that general priors, such as the importance of objects and visual consistency, are critical for efficient game-play.


Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations

Lamb, Alex, Binas, Jonathan, Goyal, Anirudh, Serdyuk, Dmitriy, Subramanian, Sandeep, Mitliagkas, Ioannis, Bengio, Yoshua

arXiv.org Machine Learning

Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data that differ from the training distribution, even when those differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks that fortifies the hidden layers of a deep network by identifying when the hidden states are off the data manifold and mapping them back to parts of the manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks. Our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to gradient masking; and (iii) show the advantage of applying this fortification in the hidden layers instead of the input space.
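The fortification idea above can be sketched as a denoising autoencoder inserted between hidden layers. This is a minimal NumPy sketch under assumed shapes and names (the paper's models are trained neural networks; here the weights are placeholders), showing only the data flow: corrupt the hidden state, encode and decode it back toward the manifold, and compute the reconstruction penalty that would be added to the task loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def fortified_layer(h, W_enc, W_dec, sigma=0.1, train=True):
    """Sketch of one fortified block: a denoising autoencoder on hidden states.

    h: (batch, d) hidden activations from the host network
    W_enc: (d, k) encoder weights; W_dec: (k, d) decoder weights
    Returns the cleaned hidden states and the reconstruction loss that
    would be added to the task loss during training.
    """
    # corrupt the hidden state during training (denoising objective)
    noisy = h + sigma * rng.standard_normal(h.shape) if train else h
    z = np.tanh(noisy @ W_enc)            # encode the (possibly noisy) state
    h_rec = z @ W_dec                     # decode back onto the learned manifold
    rec_loss = np.mean((h_rec - h) ** 2)  # penalty keeps the DAE faithful to h
    return h_rec, rec_loss
```

The network then consumes `h_rec` in place of `h`, so off-manifold states (such as those induced by adversarial perturbations) are pulled back toward regions the network handles well.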